Propensity Score Analysis with Machine Learning
HAD 7002H Causal Inference Spring/Summer 2024
Dataset - The Right Heart Catheterization
For this tutorial, we will be using the same right heart catheterization (RHC) dataset.
Data import and processing
library(tidyverse)
options(scipen = 999)
data <- read.csv("data/rhc.csv", header = TRUE)
# define exposure variable
data$A <- ifelse(data$swang1 == "No RHC", 0, 1)
# outcome is dth30, a binary outcome measuring survival status at day 30
data$Y <- ifelse(data$dth30 == "No", 0, 1)
Finalizing dataset for causal analysis
# create the analysis dataset by removing variables with a large proportion
# of missing values and variables not used in the analysis
data2 <- select(data, -c(cat2, adld3p, urin1, swang1,
sadmdte, dschdte, dthdte, lstctdte, death, dth30,
surv2md1, das2d3pc, t3d30, ptid))
data2 <- rename(data2, id = X)
Propensity score analysis using machine learning techniques
1 Super (Machine) Learning
Super learning can be used to obtain a robust estimator. In a nutshell, it uses a loss function and cross-validation to obtain the best prediction of the model parameter of interest, based on a weighted average of a library of machine learning algorithms.
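The idea can be illustrated with a minimal base-R sketch. It is not the SuperLearner package's implementation: it assumes simulated data (the names `x`, `y`, `Z`, and `sl_weights` are made up for illustration), a library of only two learners (the marginal mean and a logistic regression), squared-error loss, and a simple grid search for the convex weight in place of a proper optimizer.

```r
# Sketch of the super learner idea with two candidate learners (base R only).
set.seed(2024)

# Simulated data (hypothetical): binary outcome y depends on one covariate x.
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
dat <- data.frame(x = x, y = y)

# Assign each observation to one of V folds.
V <- 5
fold <- sample(rep(seq_len(V), length.out = n))

# Out-of-fold predictions, one column per learner.
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("mean", "glm")))
for (v in seq_len(V)) {
  train <- dat[fold != v, ]
  test  <- dat[fold == v, ]
  Z[fold == v, "mean"] <- mean(train$y)
  fit <- glm(y ~ x, data = train, family = binomial())
  Z[fold == v, "glm"] <- predict(fit, newdata = test, type = "response")
}

# Cross-validated risk (mean squared error) of each learner on its own.
cv_risk <- colMeans((dat$y - Z)^2)

# Choose the convex weight on the glm learner that minimizes the CV risk
# of the weighted average (grid search is an illustrative simplification).
alpha_grid <- seq(0, 1, by = 0.01)
ens_risk <- sapply(alpha_grid, function(a) {
  mean((dat$y - ((1 - a) * Z[, "mean"] + a * Z[, "glm"]))^2)
})
alpha_hat <- alpha_grid[which.min(ens_risk)]
sl_weights <- c(mean = 1 - alpha_hat, glm = alpha_hat)

# Final ensemble: refit each learner on all data and combine with the weights.
full_fit <- glm(y ~ x, data = dat, family = binomial())
sl_pred <- sl_weights[["mean"]] * mean(dat$y) +
  sl_weights[["glm"]] * unname(predict(full_fit, type = "response"))

print(cv_risk)
print(sl_weights)
```

Because the grid includes the weights 0 and 1, the cross-validated risk of the weighted average can never be worse than that of the better single learner in this sketch, which is the basic motivation for combining learners rather than picking one in advance.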
Guide to SuperLearner by Chris Kennedy at https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
New visual guide created by Katherine Hoffman